VIROME: a standard operating procedure for analysis of viral metagenome sequences
Abstract
One consistent finding among studies using shotgun metagenomics to analyze whole viral communities is that most viral sequences show no significant homology to known sequences. Thus, bioinformatic analyses based on sequence collections such as GenBank nr, which consist largely of sequences from known organisms, tend to ignore a majority of sequences within most shotgun viral metagenome libraries. Here we describe a bioinformatic pipeline, the Viral Informatics Resource for Metagenome Exploration (VIROME), that emphasizes the classification of viral metagenome sequences (predicted open reading frames, ORFs) based on homology search results against both known and environmental sequences. Functional and taxonomic information is derived from five annotated sequence databases that are linked to the UniRef 100 database. Environmental classifications are obtained from hits against a custom database, MetaGenomes On-Line, which contains 49 million predicted environmental peptides. Each predicted viral metagenomic ORF run through the VIROME pipeline is placed into one of seven ORF classes, so every sequence receives a meaningful annotation. Additionally, the pipeline includes quality control measures to remove contaminating and poor-quality sequences, and it assesses the potential amount of cellular DNA contamination in a viral metagenome library by screening for rRNA genes. Access to the VIROME pipeline and analysis results is provided through a web-application interface that is dynamically linked to a relational back-end database. The VIROME web-application interface is designed to give users flexibility in retrieving sequences (reads, ORFs, predicted peptides) and search results for focused secondary analyses.
Similar resources
Virome Assembly and Annotation: A Surprise in the Namib Desert
Sequencing, assembly, and annotation of environmental virome samples is challenging. Methodological biases and differences in species abundance result in fragmentary read coverage; sequence reconstruction is further complicated by the mosaic nature of viral genomes. In this paper, we focus on biocomputational aspects of virome analysis, emphasizing latent pitfalls in sequence annotation. Using ...
Duck gut viral metagenome analysis captures snapshot of viral diversity
BACKGROUND Ducks (Anas platyrhynchos), economically important waterfowl raised for meat, eggs and feathers, are also a natural reservoir for influenza A viruses. The emergence of novel viruses is attributed to the co-existence of multiple types and subtypes of viruses in the reservoir hosts. For effective prediction of future viral epidemic or pandemic an in-depth understanding of the virom...
Metavir: a web server dedicated to virome analysis
SUMMARY Metavir is a web server dedicated to the analysis of viral metagenomes (viromes). In addition to classical approaches for analyzing metagenomes (general sequence characteristics, taxonomic composition), new tools developed specifically for viral sequence analysis make it possible to: (i) explore viral diversity through automatically constructed phylogenies for selected marker genes, (ii...
Redefining Chronic Viral Infection
Viruses that cause chronic infection constitute a stable but little-recognized part of our metagenome: our virome. Ongoing immune responses hold these chronic viruses at bay while avoiding immunopathologic damage to persistently infected tissues. The immunologic imprint generated by these responses to our virome defines the normal immune system. The resulting dynamic but metastable equilibrium ...
Genome signature-based dissection of human gut metagenomes to extract subliminal viral sequences
Bacterial viruses (bacteriophages) have a key role in shaping the development and functional outputs of host microbiomes. Although metagenomic approaches have greatly expanded our understanding of the prokaryotic virosphere, additional tools are required for the phage-oriented dissection of metagenomic data sets, and host-range affiliation of recovered sequences. Here we demonstrate the applica...